Module 1.4: Basic image processing is one of the largest (and perhaps most important) modules in all of the PyImageSearch Gurus courses. While the techniques you’ll learn inside this module are quite basic, they form the cornerstones of more advanced computer vision and image processing algorithms.
For example, when we build custom image classifiers we’ll need to examine images at various scales (i.e. sizes) — and in order to obtain these various scales of an image we’ll need to understand how to resize an image.
After we have detected an object of interest using our custom image classifier, how will we go about extracting it? By cropping, of course!
Later in this course we’ll explore how to extract features to represent and quantify the contents of an image. Feature extraction plays a vital role in image classification, image search engines, and many other sub-fields of computer vision. But there may be times when we only want to extract and quantify part of an image rather than all of it, and when this happens, we’ll need to understand bitwise operations and masking.
Again, the techniques you learn inside this module are not super advanced or challenging topics to master. But they are extremely important to understand when we move on to more advanced topics. Furthermore, these image processing topics are the cornerstones on which more advanced algorithms are built. It is more than likely that you’ll use one or more of these techniques inside your own computer vision applications.
Today, we’ll review the first of these important image processing techniques: translation.
Objectives:
To understand how to translate an image using OpenCV.
Translation
Translation is the shifting of an image along the x- and y-axes. Using translation, we can shift an image up, down, left, or right, or any combination of these.
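Mathematically, we define a translation matrix that we can use to translate an image:

$$
M = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}
$$

where $t_x$ and $t_y$ are the horizontal and vertical shifts, in pixels.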
This concept is better explained through some code:
On Lines 1-5 we simply import the packages we will make use of. At this point, using numpy, argparse, and cv2 should feel commonplace. However, I am introducing a new package here: imutils. This isn’t a package included in NumPy or OpenCV. Rather, it’s a library that I personally wrote that contains a handful of “convenience” methods to perform common tasks like translation, rotation, and resizing more easily (and with less code).
If you are using the PyImageSearch Gurus virtual machine, or if you followed the steps for creating your own custom development environment, then the imutils package is already installed for you.

Otherwise, you can grab the source from GitHub or simply install it with pip:

$ pip install imutils
Anyway, after we have the necessary packages imported, we construct our argument parser and load our image on Lines 8-13. Below we can see our original image:
The first actual translation takes place on Lines 22-24, where we start by defining our translation matrix M. This matrix tells us how many pixels the image will be shifted left or right, and then how many pixels it will be shifted up or down.
Our translation matrix M is defined as a floating point array. This is important because OpenCV expects this matrix to be of floating point type. The first row of the matrix is $[1, 0, t_x]$, where $t_x$ is the number of pixels we will shift the image left or right. Negative values of $t_x$ will shift the image to the left and positive values will shift the image to the right.

Then, we define the second row of the matrix as $[0, 1, t_y]$, where $t_y$ is the number of pixels we will shift the image up or down. Negative values of $t_y$ will shift the image up and positive values will shift the image down.

Using this notation, on Line 22 we can see that $t_x = 25$ and $t_y = 50$, indicating that we are shifting the image 25 pixels to the right and 50 pixels down.
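To make the matrix concrete, here is a small NumPy sketch (not from the original listing) of what $M$ does to a single pixel coordinate. Written in homogeneous form $[x, y, 1]$, multiplying by $M$ gives $[x + t_x,\; y + t_y]$. With $t_x = 25$ and $t_y = 50$, the pixel at column 10, row 20 therefore ends up at column 35, row 70.

```python
import numpy as np

# Shift 25 pixels right (t_x > 0) and 50 pixels down (t_y > 0).
t_x, t_y = 25, 50

# OpenCV expects a 2x3 floating point matrix for affine warps.
M = np.float32([[1, 0, t_x],
                [0, 1, t_y]])

# Pixel at column x=10, row y=20, written in homogeneous coordinates [x, y, 1].
point = np.array([10, 20, 1], dtype=np.float32)

# M @ [x, y, 1] = [x + t_x, y + t_y]
new_x, new_y = M @ point
print(new_x, new_y)  # 35.0 70.0
```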
Now that we have our translation matrix defined, the actual translation takes place on Line 23 using the cv2.warpAffine function. The first argument is the image we wish to shift and the second argument is our translation matrix M. Finally, we manually supply the dimensions of our image as the third argument, ordered as (width, height). Note that this is the reverse of the (height, width) order NumPy uses for image.shape.
Line 24 displays the results of the translation which we can see below:
Notice how the image has clearly been “shifted” down and to the right.
Moving on to Lines 28-30, we perform another translation. Here, we set $t_x = -50$ and $t_y = -90$, implying that we are shifting the image 50 pixels to the left and 90 pixels up. The image is shifted left and up rather than right and down because we are providing negative values for both $t_x$ and $t_y$.
The figure below shows the output of supplying negative values for both $t_x$ and $t_y$:
Again, notice how our image is “shifted” to the left 50 pixels and up 90 pixels.
However, manually constructing this translation matrix and calling the cv2.warpAffine method takes a fair amount of code — and it’s not pretty code either!
This is where the imutils package comes in. Instead of having to define our matrix M and make a call to cv2.warpAffine each time we want to translate an image, let’s define a translate convenience function that takes care of this for us:
Our translate method takes three parameters: the image we are going to translate, the number of pixels that we are going to shift along the x-axis, and the number of pixels we are going to shift along the y-axis.
This method then defines our translation matrix M on Line 7 and then applies the actual shift on Line 8. Finally, we return the shifted image on Line 11.
And again, this translate function is already part of the imutils package — there is no need for you to define this function yourself!
Anyway, let’s apply our translate method and compare it to the methods discussed above:
Using our convenience translate method, we are able to shift the image 100 pixels down using a single line of code. Furthermore, this translate method is much easier to use — less code is required and based on the function name, we conveniently know what image processing task is being performed.
As you can see from the image below, the output is as expected, with the image shifted 100 pixels down:
Summary
In this section we explored how to shift an image up, down, left, and right. We were also introduced to the imutils package which contains a handful of convenient functions that make our lives easier when performing basic image processing operations.
Next up, we’ll explore how to rotate an image.